King's Research Portal

Smarter Vaccine Design Will Circumvent Regulatory T Cell-Mediated Evasion in Chronic HIV and HCV Infection

Author: Andres H. Gutierrez
Anne Searls De Groot
Chris Bailey-Kellogg
Frances Terry
Leonard Moise
Phyllis Losikoff
Ryan Tassone
Stephen H. Gregory
William D Martin
Publication venue: Dartmouth Digital Commons
Publication date: 01/01/2014

Despite years of research, vaccines against HIV and HCV are not yet available, due largely to effective viral immunoevasive mechanisms. A novel escape mechanism observed in viruses that cause chronic infection is suppression of viral-specific effector CD4(+) and CD8(+) T cells by stimulating regulatory T cells (Tregs) educated on host sequences during tolerance induction. Viral class II MHC epitopes that share a T cell receptor (TCR)-face with host epitopes may activate Tregs capable of suppressing protective responses. We designed an immunoinformatic algorithm, JanusMatrix, to identify such epitopes and discovered that among human-host viruses, chronic viruses appear more human-like than viruses that cause acute infection. Furthermore, an HCV epitope that activates Tregs in chronically infected patients, but not clearers, shares a TCR-face with numerous human sequences. To boost weak CD4(+) T cell responses associated with persistent infection, vaccines for HIV and HCV must circumvent potential Treg activation that can handicap efficacy. Epitope-driven approaches to vaccine design that involve careful consideration of the T cell subsets primed during immunization will advance HIV and HCV vaccine development.

Frontiers - Publisher Connector

Dartmouth Digital Commons (Dartmouth College)

DigitalCommons@URI

Bounded prefix-suffix duplication

Author: A. Ehrenfeucht
D. Gusfield
D. Knuth
D.B. Searls
D.P. Bovet
J. Dassow
J. Kärkkäinen
M. Crochemore
M. Frazier
M.-W. Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

We consider a restricted variant of the prefix-suffix duplication operation, called bounded prefix-suffix duplication. It consists in the iterative duplication of a prefix or suffix, whose length is bounded by a constant, of a given word. We give a sufficient condition for the closure under bounded prefix-suffix duplication of a class of languages. Consequently, the class of regular languages is closed under bounded prefix-suffix duplication; furthermore, we propose an algorithm deciding whether a regular language is a finite k-prefix-suffix duplication language. An efficient algorithm solving the membership problem for the k-prefix-suffix duplication of a language is also presented. Finally, we define the k-prefix-suffix duplication distance between two words, extend it to languages and show how it can be computed for regular languages.
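The operation described in this abstract can be sketched in a few lines of Python. This is only an illustration, not the paper's algorithm: the function names are hypothetical, the distance is read here as the minimum number of duplication steps needed to derive one word from another, and the search is a brute-force breadth-first search rather than the efficient method the abstract refers to.

```python
from collections import deque
from typing import Optional, Set


def k_ps_duplications(word: str, k: int) -> Set[str]:
    """Words reachable from `word` by duplicating one prefix or suffix of length 1..k."""
    results = set()
    for n in range(1, min(k, len(word)) + 1):
        results.add(word[:n] + word)   # duplicate the prefix of length n
        results.add(word + word[-n:])  # duplicate the suffix of length n
    return results


def k_ps_distance(source: str, target: str, k: int) -> Optional[int]:
    """Fewest duplication steps turning `source` into `target`, or None if impossible.

    Assumed reading of the distance; brute-force BFS, not the paper's algorithm.
    Every step lengthens the word, so words as long as `target` are never expanded.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        current, steps = queue.popleft()
        for nxt in k_ps_duplications(current, k):
            if nxt == target:
                return steps + 1
            if len(nxt) < len(target) and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None
```

For example, with k = 1 the word "ab" yields "aab" and "abb" in one step, and "aabb" is two steps away from "ab".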

arXiv.org e-Print Archive

Archivo Digital UPM

Developing and applying heterogeneous phylogenetic models with XRate

Author: A Heger
A Siepel
A Varadarajan
AJ Drummond
B Knudsen
Christos A. Ouzounis
D Ayres
DB Searls
E Birney
G Lunter
GSC Slater
Ian Holmes
IM Meyer
J Felsenstein
J Goecks
J Watts
JS Pedersen
L Stein
M Garber
M Hasegawa
M Kimura
M Zuker
ME Skinner
N Saitou
O Penn
Oscar Westesson
PS Klosterman
RK Bradley
SR Eddy
TH Jukes
WJ Kent
Z Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 16/02/2012

Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would have previously been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART. Comment: 34 pages, 3 figures, glossary of XRate model terminology

University of Salford Institutional Repository

FigShare

Are grammatical representations useful for learning from biological sequence data?— a case study

Author: A. Srinivasan
A. Whittaker
C. Rawlings
C.H. Bryant
Ling C.
S. Topp
S.H. Muggleton
Searls D.
Publication venue: Mary Ann Liebert Inc
Publication date: 01/10/2001

This paper investigates whether Chomsky-like grammar representations are useful for learning cost-effective, comprehensible predictors of members of biological sequence families. The Inductive Logic Programming (ILP) Bayesian approach to learning from positive examples is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Collectively, five of the co-authors of this paper have extensive expertise on NPPs and general bioinformatics methods. Their motivation for generating an NPP grammar was that none of the existing bioinformatics methods could provide sufficient cost-savings during the search for new NPPs. Prior to this project, experienced specialists at SmithKline Beecham had tried for many months to hand-code such a grammar, without success. Our best predictor makes the search for novel NPPs more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. As far as these authors are aware, this is both the first biological grammar learnt using ILP and the first real-world scientific application of the ILP Bayesian approach to learning from positive examples. A group of features is derived from this grammar. Other groups of features of NPPs are derived using other learning strategies. Amalgams of these groups are formed. A recognition model is generated for each amalgam using C4.5 and C4.5rules, and its performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The highest RA was achieved by a model which includes grammar-derived features. This RA is significantly higher than the best RA achieved without the use of the grammar-derived features. Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of (the rare) positives, all the models are awarded a similar (high) score by predictive accuracy because they all exclude most of the abundant negatives.
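The point about predictive accuracy can be made concrete with a small calculation. The confusion-matrix counts below are invented purely for illustration (10 positives among 1,000 examples); they are not results from the paper, and the sketch does not implement the Relative Advantage measure, which the abstract does not define.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp: int, fn: int) -> float:
    """Fraction of the true positives that the model recovers."""
    return tp / (tp + fn)


# Hypothetical models evaluated on 10 positives and 990 negatives.
# Both reject almost all negatives, but they cover very different numbers of positives.
model_a = {"tp": 2, "fp": 5, "tn": 985, "fn": 8}
model_b = {"tp": 8, "fp": 5, "tn": 985, "fn": 2}

for name, m in (("A", model_a), ("B", model_b)):
    print(name, round(accuracy(**m), 3), round(recall(m["tp"], m["fn"]), 3))
```

With these made-up counts the two accuracies are 0.987 and 0.993, while recall is 0.2 versus 0.8: accuracy barely separates two models that differ fourfold in how many rare positives they find.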

Applications of Generalized Pair Hidden Markov Models to Alignment and Gene Finding Problems

Author: Dayhoff M.O.
Korf I.
Kulp D.
Lior Pachter
Marina Alexandersson
Müller T.
Searls D.B.
Simon Cawley
Publication venue: Mary Ann Liebert Inc
Publication date

Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set

Author: BFJ Manly
CI Castillo-Davis
David Johnson
DB Searls
DD Womble
E Badidi
F Antequera
J Krueger
J Theilhaber
JD Wren
JF Costello
JM Claverie
Jonathan D Wren
JR Quinlan
K Davies
K Nakai
L Stein
Le Gruenwald
LV Zhang
M Ashburner
M Gardiner-Garden
M Safran
P Clark
RS Michalski
S Foissac
S Muggleton
SP Shah
TV Venkatesh
V Bajic
W Frawley
WM Shui
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2005

There is an enormous amount of information encoded in each genome – enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. Changes in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features are varied in their data types and annotation of these features is spread across multiple databases. Herein, we develop a method to automate exploration of genomes by iteratively exploring sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and succeeded in detecting coinciding genomic features among genes, exons, repetitive elements and CpG islands.
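A minimal sketch of the sequence-matrix idea follows, assuming the simplest possible representation: one boolean row per annotation type and one column per sequence position. The function names, the half-open interval convention, and the toy coordinates are all assumptions made for illustration; the classification tree of row types described in the abstract is not modeled here.

```python
from typing import List, Tuple


def build_row(length: int, intervals: List[Tuple[int, int]]) -> List[bool]:
    """Boolean row marking the positions covered by half-open [start, end) intervals."""
    row = [False] * length
    for start, end in intervals:
        for i in range(max(start, 0), min(end, length)):
            row[i] = True
    return row


def overlap_fraction(row_a: List[bool], row_b: List[bool]) -> float:
    """Fraction of the positions covered by row_a that are also covered by row_b."""
    covered = sum(row_a)
    if covered == 0:
        return 0.0
    both = sum(a and b for a, b in zip(row_a, row_b))
    return both / covered


# Toy 20-position sequence with two hypothetical annotation rows.
matrix = {
    "gene": build_row(20, [(2, 12)]),
    "cpg_island": build_row(20, [(0, 4), (15, 18)]),
}

print(overlap_fraction(matrix["cpg_island"], matrix["gene"]))
print(overlap_fraction(matrix["gene"], matrix["cpg_island"]))
```

Because every row shares the same position axis, comparing any two annotation sources reduces to a column-wise comparison, which is what makes an automated sweep over all row pairs straightforward.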

Springer - Publisher Connector

Genoviz Software Development Kit: Java tool kit for building genomics visualization applications

Author: AE Loraine
Ann E Loraine
BJ Haas
Cyrus Harmon
D Huntley
DB Searls
Ed Erwin
EL Sonnhammer
Eric Blossom
GA Helt
Gregg A Helt
John W Nicol
JW Nicol
MS Cline
NL Harris
P Aldhous
RC Holland
S Fischer
S Hoon
Stephen A Chervitz
Steven G Blanchard
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Visualization software can expose previously undiscovered patterns in genomic data and advance biological science. Results: The Genoviz Software Development Kit (SDK) is an open source, Java-based framework designed for rapid assembly of visualization software applications for genomics. The Genoviz SDK framework provides a mechanism for incorporating adaptive, dynamic zooming into applications, a desirable feature of genome viewers. Visualization capabilities of the Genoviz SDK include automated layout of features along genetic or genomic axes; support for user interactions with graphical elements (Glyphs) in a map; a variety of Glyph sub-classes that promote experimentation with new ways of representing data in graphical formats; and support for adaptive, semantic zooming, whereby objects change their appearance depending on zoom level and zooming rate adapts to the current scale. Freely available demonstration and production quality applications, including the Integrated Genome Browser, illustrate Genoviz SDK capabilities. Conclusion: Separation between graphics components and genomic data models makes it easy for developers to add visualization capability to pre-existing applications or build new applications using third-party data models. Source code, documentation, sample applications, and tutorials are available at http://genoviz.sourceforge.net/.

Springer - Publisher Connector

Public Library of Science (PLOS)

Modeling Structure-Function Relationships in Synthetic DNA Sequences using Attribute Grammars

Recognizing that certain biological functions can be associated with specific DNA sequences has led various fields of biology to adopt the notion of the genetic part. This concept provides a finer level of granularity than the traditional notion of the gene. However, a formal method for relating a set of parts to a function has not yet emerged. Synthetic biology both demands such a formalism and provides an ideal setting for testing hypotheses about relationships between DNA sequences and phenotypes beyond the gene-centric methods used in genetics. Attribute grammars are used in computer science to translate the text of a program source code into the computational operations it represents. By associating attributes with parts, modifying the value of these attributes using rules that describe the structure of DNA sequences, and using a multi-pass compilation process, it is possible to translate DNA sequences into molecular interaction network models. These capabilities are illustrated by simple example grammars expressing how gene expression rates are dependent upon single or multiple parts. The translation process is validated by systematically generating, translating, and simulating the phenotype of all the sequences in the design space generated by a small library of genetic parts. Attribute grammars represent a flexible framework connecting parts with models of biological function. They will be instrumental for building mathematical models of libraries of genetic constructs synthesized to characterize the function of genetic parts. This formalism is also expected to provide a solid foundation for the development of computer-assisted design applications for synthetic biology.
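As a rough illustration of the attribute-grammar idea, the toy below parses a design made of cassettes, each following the single production cassette -> promoter rbs gene terminator, and computes one synthesized attribute per cassette. Everything specific here is an assumption for the sake of the sketch: the production, the part kinds, the numeric attribute values, and the rule that an expression rate is the product of promoter strength and RBS efficiency. The paper's actual grammars and its multi-pass compilation are not reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List

EXPECTED_KINDS = ("promoter", "rbs", "gene", "terminator")


@dataclass(frozen=True)
class Part:
    name: str
    kind: str            # one of EXPECTED_KINDS
    value: float = 0.0   # hypothetical attribute: promoter strength or RBS efficiency


def translate(parts: List[Part]) -> Dict[str, float]:
    """Map each gene to an expression rate computed from the parts in its cassette.

    Syntax check: the design must be a sequence of promoter-rbs-gene-terminator cassettes.
    Semantic rule (assumed): rate = promoter.value * rbs.value.
    """
    if len(parts) % len(EXPECTED_KINDS) != 0:
        raise ValueError("design is not a whole number of cassettes")
    model = {}
    for i in range(0, len(parts), len(EXPECTED_KINDS)):
        cassette = parts[i:i + len(EXPECTED_KINDS)]
        if tuple(p.kind for p in cassette) != EXPECTED_KINDS:
            raise ValueError(f"cassette starting at part {i} does not match the grammar")
        promoter, rbs, gene, _terminator = cassette
        model[gene.name] = promoter.value * rbs.value
    return model


design = [
    Part("pStrong", "promoter", 2.0),
    Part("rbsWeak", "rbs", 0.5),
    Part("gfp", "gene"),
    Part("t1", "terminator"),
]
print(translate(design))
```

The structural rule and the attribute rule are kept separate, so swapping in a different rate model would only change the line that computes the attribute, not the parsing.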